Interrater reliability: the kappa statistic
Abstract
The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While a variety of methods exist to measure interrater reliability, it was traditionally measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued the use of percent agreement due to its inability to account for chance agreement. He introduced Cohen's kappa, developed to account for the possibility that raters actually guess on at least some variables due to uncertainty. Like most correlation statistics, kappa can range from -1 to +1. While kappa is one of the most commonly used statistics to test interrater reliability, it has limitations. Judgments about what level of kappa is acceptable for health research are questioned. Cohen's suggested interpretation may be too lenient for health-related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and the levels of both that should be demanded in healthcare studies are suggested.
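To make the abstract's point concrete, here is a minimal Python sketch (not from the article itself) that computes both statistics for two raters scoring the same nominal variable; the ratings and function names below are hypothetical illustrations:

```python
from collections import Counter

def percent_agreement(rater1, rater2):
    # Proportion of items on which the two raters assigned the same score.
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return matches / len(rater1)

def cohens_kappa(rater1, rater2):
    # Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    # agreement and p_e is the agreement expected by chance, computed from
    # each rater's marginal category frequencies.
    n = len(rater1)
    p_o = percent_agreement(rater1, rater2)
    freq1, freq2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    p_e = sum((freq1[c] / n) * (freq2[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1

# Hypothetical scores from two data collectors (1 = symptom present, 0 = absent)
r1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
r2 = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(percent_agreement(r1, r2))       # 0.8
print(round(cohens_kappa(r1, r2), 2))  # 0.52
```

In this example, 80% raw agreement shrinks to a kappa of about 0.52 once chance agreement (p_e = 0.58 for these marginals) is removed, which illustrates why percent agreement alone can overstate reliability and why a kappa threshold as low as 0.41 may be too lenient for healthcare data.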
Similar articles
Agreement, the F-measure, and reliability in information retrieval.
Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the kappa statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can...
Reliability of the visual assessment of cervical and lumbar lordosis: how good are we?
STUDY DESIGN Blinded test-retest design. OBJECTIVE To measure the intrarater and interrater reliability of the visual assessment of cervical and lumbar lordosis. SUMMARY OF BACKGROUND DATA Cervical and lumbar lordoses are frequently evaluated using visual assessment, but little attempt has previously been made to measure the reliability of visual assessment. METHODS Twenty-eight chiroprac...
Reliability of the NICMAN Scale: An Instrument to Assess the Quality of Acupuncture Administered in Clinical Trials
BACKGROUND The aim of this study was to examine the reliability of a scale to assess the methodological quality of acupuncture administered in clinical research. METHODS We invited 36 acupuncture researchers and postgraduate students to participate in the study. Firstly, participants rated two articles using the scale. Following this initial stage, modifications were made to scale items and t...
Comments on the article "can the ICF be used as a rehabilitation outcome measure? A study looking at the inter- and intra-rater reliability of ICF categories derived from an ADL assessment tool".
PURPOSE The categories of the International Classification of Functioning, Disability and Health (ICF) could potentially be used as components of outcome measures. Literature demonstrating the psychometric properties of ICF categories is limited. OBJECTIVE Determine the agreement and reliability of ICF activities of daily living category scores and compare these to agreement and reliability ...
Systemic lupus erythematosus disease activity index 2000 responder index-50 website.
OBJECTIVE To test the interrater and intrarater reliability of the Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2K) Responder Index (SRI-50), an index designed to measure ≥ 50% improvement in disease activity between visits in patients with systemic lupus erythematosus. METHODS This was a multicenter, cross-sectional study with raters from Canada, the United Kingdom, and A...